ABC-Net [147] is another network designed to improve the performance of binary networks. ABC-Net approximates the full-precision weight filter $W$ with a linear combination of $M$ binary filters $B_1, B_2, \ldots, B_M \in \{+1, -1\}$ such that $W \approx \alpha_1 B_1 + \cdots + \alpha_M B_M$. These binary filters are fixed as follows:
$$B_i = F_{u_i}(W) := \mathrm{sign}\big(\bar{W} + u_i\,\mathrm{std}(W)\big), \quad i = 1, 2, \ldots, M, \tag{1.11}$$
where $\bar{W} = W - \mathrm{mean}(W)$ is the mean-centered weight and $\mathrm{std}(W)$ is the standard deviation of $W$. For activations, ABC-Net employs multiple binary activations to alleviate information loss. As with the weight binarization, the real activation $I$ is estimated using a linear combination of $N$ binary activations $A_1, A_2, \ldots, A_N$ such that $I \approx \beta_1 A_1 + \cdots + \beta_N A_N$, where
$$A_1, A_2, \ldots, A_N = H_{v_1}(R), H_{v_2}(R), \ldots, H_{v_N}(R). \tag{1.12}$$
$H_v(R)$ in Eq. (1.12) is a binary function, $h$ is a bounded activation function, $\mathbb{1}$ is the indicator function, and $v$ is a shift parameter. Unlike the weight case, the parameters $\beta$ and $v$ are trainable. Without explicit linear regression, the network tunes the $\beta_n$'s and $v_n$'s during training and fixes them for testing. They are expected to learn and utilize the statistical features of full-precision activations.
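To make Eqs. (1.11) and (1.12) concrete, the following PyTorch-style sketch builds the binary weight bases and binary activations described above. The even spacing of the shifts $u_i$ in $[-1, 1]$, the least-squares fit for the $\alpha_i$, and the clip-style choice of $h$ are assumptions made for illustration, not ABC-Net's exact training procedure.

```python
import torch

def abc_weight_approx(W, M=3):
    """Approximate W by sum_i alpha_i * B_i with M binary bases (cf. Eq. (1.11))."""
    W_bar = W - W.mean()                       # mean-centered weights
    std = W.std()
    us = torch.linspace(-1.0, 1.0, M)          # shift parameters u_1, ..., u_M (assumed spacing)
    B = torch.stack([torch.sign(W_bar + u * std) for u in us])       # M binary bases
    # Least-squares fit of the scaling factors: min_alpha ||W - sum_i alpha_i B_i||^2.
    A = B.reshape(M, -1).T                     # (numel, M)
    alpha = torch.linalg.lstsq(A, W.reshape(-1, 1)).solution.squeeze(1)
    W_hat = torch.einsum("m,m...->...", alpha, B)                    # reconstructed filter
    return alpha, B, W_hat

def binary_activation(R, v):
    """H_v(R) = 2 * 1[h(R + v) >= 0.5] - 1, with h taken as a clip to [0, 1] (assumed)."""
    h = torch.clamp(R + v, 0.0, 1.0)
    return torch.where(h >= 0.5, torch.ones_like(h), -torch.ones_like(h))
```

For instance, `abc_weight_approx(torch.randn(64, 3, 3), M=3)` returns the scaling factors, the three binary bases, and the reconstructed filter they combine into.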
Ternary-Binary Network (TBN) [228] is a CNN with ternary inputs and binary weights.
Based on accelerated ternary-binary matrix multiplication, TBN uses efficient operations
such as XOR, AND, and bit count in standard CNNs, and thus provides an optimal trade-off between memory, efficiency, and performance. Wang et al. [233] propose a simple yet effective two-step quantization framework (TSQ) that decomposes network quantization into two steps: code learning and transformation-function learning based on the learned codes. TSQ fits primarily into the class of 2-bit neural networks.
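As an illustration of the ternary-binary arithmetic TBN builds on, the plain-Python sketch below encodes a ternary activation vector as a nonzero mask plus a sign bitmap, and a binary weight vector as a sign bitmap, so that the dot product reduces to XOR, AND, and bit counts. The encoding is chosen for exposition and is not TBN's actual kernel.

```python
def ternary_binary_dot(x, w):
    """Dot product of a ternary vector x (entries in {-1, 0, +1}) with a binary
    vector w (entries in {-1, +1}) using only XOR, AND, and bit counts."""
    mask = signs_x = signs_w = 0
    for i, (xi, wi) in enumerate(zip(x, w)):
        if xi != 0:
            mask |= 1 << i        # nonzero positions of x
        if xi > 0:
            signs_x |= 1 << i     # sign bitmap of x
        if wi > 0:
            signs_w |= 1 << i     # sign bitmap of w
    # Where signs differ (and x != 0) the product is -1; where they agree, +1.
    diff = (signs_x ^ signs_w) & mask
    return bin(mask).count("1") - 2 * bin(diff).count("1")

# Sanity check against the naive sum of products.
x = [1, 0, -1, 1, 0, -1]
w = [1, -1, -1, -1, 1, 1]
assert ternary_binary_dot(x, w) == sum(xi * wi for xi, wi in zip(x, w))
```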
Local Binary Convolutional Network (LBCNN) [109] proposes a local binary convolution
(LBC), which is motivated by local binary patterns (LBP), a descriptor of images rooted
in the face recognition community. The LBC layer has a set of fixed, sparse predefined
binary convolutional filters that are not updated during the training process, a non-linear
activation function, and a set of learnable linear weights. The linear weights combine the
activated filter responses to approximate a standard convolutional layer’s corresponding
activated filter responses. The LBC layer often affords significant parameter savings, with 9x to 169x fewer learnable parameters than a standard convolutional layer. Furthermore, the sparse and binary nature of the weights also yields up to 169x savings in model size compared to a conventional convolutional layer.
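A rough PyTorch sketch of an LBC layer under this description: a set of fixed, sparse, random ±1 filters, a nonlinearity, and a learnable 1x1 combination of the responses. The number of anchor filters, the sparsity level, and the random initialization below are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LBCLayer(nn.Module):
    """Sketch of a local binary convolution layer: fixed sparse binary (+-1) anchor
    filters, a nonlinearity, and a learnable 1x1 linear combination."""

    def __init__(self, in_ch, out_ch, num_anchors=128, kernel_size=3, sparsity=0.1):
        super().__init__()
        # Fixed, non-trainable sparse binary filters.
        w = torch.zeros(num_anchors, in_ch, kernel_size, kernel_size)
        nz = torch.rand_like(w) < sparsity                              # nonzero positions
        w[nz] = torch.randint(0, 2, (int(nz.sum()),)).float() * 2 - 1   # random +-1 values
        self.register_buffer("anchor_weights", w)                       # excluded from training
        # Learnable 1x1 convolution that linearly combines the activated responses.
        self.linear = nn.Conv2d(num_anchors, out_ch, kernel_size=1, bias=False)
        self.pad = kernel_size // 2

    def forward(self, x):
        y = F.conv2d(x, self.anchor_weights, padding=self.pad)   # fixed binary convolution
        return self.linear(torch.relu(y))                        # nonlinearity + learned mixing
```

Only the 1x1 combination weights are updated during training, which is where the parameter savings relative to a standard convolutional layer come from.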
Modulated Convolutional Networks (MCN) [236] first introduce modulation filters (M-Filters) to recover full-precision filters from their binarized counterparts. M-Filters are designed to approximate unbinarized convolutional filters in an end-to-end framework. Each layer shares only one M-Filter, leading to a significant reduction in model size. To reconstruct the unbinarized filters, they
introduce a modulated process based on the M-Filters and binarized filters. Figure 1.1 is an
example of the modulation process. In this example, the M-Filter has four planes, each of
which can be expanded to a 3D matrix according to the channels of the binarized filter. After
the ◦ operation between the binarized filter and each expanded M-Filter, the reconstructed
filter Q is obtained.
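A minimal sketch of this modulation process, assuming the ◦ operation is an element-wise product between the binarized filter and each channel-expanded plane of the M-Filter (the tensor shapes below are illustrative):

```python
import torch

def mcn_reconstruct(binary_filter, m_filter):
    """Sketch of the MCN modulation step: each plane of the shared M-Filter is
    expanded over the channels of a binarized filter and combined with it
    element-wise, yielding one reconstructed filter per plane.

    binary_filter: (C, H, W) tensor with entries in {-1, +1}
    m_filter:      (K, H, W) tensor holding the K planes of the M-Filter
    returns:       (K, C, H, W) reconstructed filters Q
    """
    K = m_filter.shape[0]
    C = binary_filter.shape[0]
    expanded = m_filter.unsqueeze(1).expand(K, C, -1, -1)   # expand each plane over channels
    Q = binary_filter.unsqueeze(0) * expanded               # the element-wise "o" operation
    return Q

# Example: a 4-plane M-Filter and a binarized 8-channel 3x3 filter give Q of shape (4, 8, 3, 3).
Q = mcn_reconstruct(torch.sign(torch.randn(8, 3, 3)), torch.rand(4, 3, 3))
print(Q.shape)
```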
As shown in Fig. 1.2, the reconstructed filters Q are used to calculate the output feature
maps F. There are four planes in Fig. 1.2, so the number of channels in the feature maps
is also 4. With the MCN convolution, the numbers of input and output channels of the feature maps are the same, allowing the module to be replicated and MCNs to be easily implemented.
Unlike previous work in which the model binarizes each filter independently, Bulat et al.
[23] propose parameterizing each layer’s weight tensor using a matrix or tensor decomposi-
tion. The binarization process uses latent parametrization through a quantization function